ABSTRACT

Multi-sample cluster analysis, the problem of grouping samples, is studied
from an information-theoretic viewpoint via Akaike's Information Criterion
(AIC). This criterion combines the maximum value of the likelihood with the
number of parameters used in achieving that value. The multi-sample cluster
problem is defined, and AIC is developed for this problem. The form of AIC is
derived in the univariate model with varying means and variances, and in the
multivariate model with varying mean vectors and variance-covariance matrices.
Numerical examples are presented and results are shown to demonstrate the
utility of AIC in identifying the best clustering alternatives.

Key Words and Phrases: Multi-sample cluster analysis; Akaike's Information
Criterion (AIC); univariate model with varying means and variances;
multivariate model with varying mean vectors and variance-covariance
matrices; maximum likelihood.

1. Introduction

In a previous paper (Bozdogan and Sclove [5]), we introduced and developed
Akaike's Information Criterion (AIC) for multi-sample cluster analysis, the
problem of grouping samples. We derived the form of AIC in both univariate and
multivariate analysis of variance models, where the assumptions of
independence, univariate and multivariate normality, and equal variances and
variance-covariance matrices were fundamental to the analysis. We gave
numerical examples and results which demonstrated the utility of AIC in
identifying the best clustering alternatives.

In this paper, we shall continue to study the multi-sample cluster
problem. Here, however, we shall develop Akaike's Information Criterion (AIC)
for multi-sample cluster analysis with varying means and variances in the
univariate model, and with varying mean vectors and variance-covariance
matrices in the multivariate model, since in practice the assumption of equal
parameters within the model is often a rather dubious requirement.

Many practical situations require the presentation of multivariate data
from several structured samples for comparative purposes and the grouping of
the heterogeneous samples into homogeneous sets of samples in which parameters
might vary. For this reason it is reasonable to provide a practically useful
statistical procedure that uses a statistical model to aid in comparing
various collections of samples, identifies homogeneous groups of samples, and
tells us which samples should be clustered together and which should not. For
examples of multi-sample clustering situations, we refer the reader to
Bozdogan and Sclove [5].

*Presented by the first author as an Invited Paper, Special Session on
Cluster Analysis, 789th Meeting, American Mathematical Society, University of
Massachusetts, Amherst, MA, October 16-18, 1981.

In the statistical literature several conventional test procedures are
available for testing whether or not several populations have equal variances,
as required by the analysis of variance (ANOVA) model. If we have reason to
doubt that this is the case, then we may first want to test the equality of
variances. In the multivariate case, assuming equality of covariance matrices
is certainly more hazardous. For this reason we may want first to test the
equality of variances in the univariate case, and the equality of covariance
matrices in the multivariate case. This is an important option to use in
clustering groups or samples, and in general.

In the literature, however, there are several test procedures for testing
the equality of variances and covariance matrices. For example, in the
multivariate case, the most commonly used test is Box's M test, despite the
fact that it is very expensive to compute, even on a high-speed computer such
as an IBM 370. Moreover, as in the case of MANOVA, these test procedures are
not revealing or informative in multi-sample clustering problems. Therefore,
in this paper we shall again propose Akaike's Information Criterion (AIC) as a
new procedure for comparing the clusters, and use it to identify the best
clustering alternatives.

In 1971, Akaike first introduced an information criterion, referred to as
an Automatic (Model) Identification Criterion or Akaike's Information
Criterion (AIC), for the identification and comparison of statistical models
in a class of competing models with different numbers of parameters. It is
defined by

(1.1)  $\mathrm{AIC} = -2 \log_e(\text{maximized likelihood}) + 2\,(\text{number of independently adjusted parameters within the model}).$

It was obtained with the aid of an information-theoretic interpretation of the
method of maximum likelihood by Akaike ([2], [3]). It estimates minus twice
the expected log likelihood of the model whose parameters are determined by the
method of maximum likelihood. When several competing models are being compared
or fitted, AIC is a simple procedure which measures the badness of fit or the
discrepancy of the estimated model from the true model when a set of data is
given.

The first term in (1.1) stands for the penalty of badness of fit or
downward bias when the maximum likelihood estimators of the parameters of the
model are used. The second term in the definition of AIC, on the other hand,
stands for the penalty of increased unreliability, or compensation for the bias
in the first term, as a consequence of an increasing number of parameters. If
more parameters are used to describe the data, it is natural to get a larger
likelihood, possibly without improving the true goodness of fit. The AIC
avoids this spurious improvement of fit by penalizing the use of additional
parameters.

Thus, when there are several competing models, the parameters within the
models are estimated by the method of maximum likelihood and the AIC values are
computed and compared to find a model with the minimum value of AIC. This
procedure is called the minimum AIC procedure. The model with the minimum AIC
is called the minimum AIC estimate (MAICE) and is designated as the best model.
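The minimum AIC procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `aic` and `maice` and the three candidate models with their log likelihoods are hypothetical and do not come from the paper.

```python
def aic(max_log_likelihood, num_params):
    """AIC as in (1.1): -2 log_e(maximized likelihood) + 2 (number of parameters)."""
    return -2.0 * max_log_likelihood + 2.0 * num_params


def maice(candidates):
    """Return (name, AIC) of the candidate model with the smallest AIC.

    `candidates` maps a model name to a pair
    (maximized log likelihood, number of independently adjusted parameters).
    """
    scores = {name: aic(loglik, m) for name, (loglik, m) in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores[best]


# Hypothetical values, for illustration only.
candidates = {
    "model_A": (-120.0, 2),
    "model_B": (-118.5, 4),
    "model_C": (-118.4, 6),
}
best_name, best_aic = maice(candidates)
print(best_name, best_aic)
```

In this made-up example the larger models reach a slightly higher likelihood, but not by enough to offset the penalty of two units per additional parameter, so the smallest model is selected.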

In Section 2, we shall define the general multi-sample cluster problem,
and in Section 3, we shall briefly discuss the number of clustering
alternatives when a given K groups or samples are clustered into k nonempty
clusters. In Sections 4 and 5, we shall derive the AIC procedure for the
univariate model with varying means and variances, and for the multivariate
model with varying mean vectors and covariance matrices, respectively. In
Section 6, we shall give numerical examples for both univariate and
multivariate multi-sample cluster analysis on the same real data set to
demonstrate the results of our AIC and minimum AIC procedures obtained from
different computer analyses.

2.  The  Multi-Sample  Cluster  Problem 

Suppose each individual, object, or case has been measured on p response
or outcome measures (dependent variables) simultaneously in K independent
groups or samples (factor levels). Let

(2.1)  $X_{(n \times p)} = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_K \end{bmatrix}$

be a single data matrix of K groups or samples, where $X_g$ $(n_g \times p)$
represents the observations from the g-th group or sample, g = 1, 2, ..., K,
and $n = \sum_{g=1}^{K} n_g$. The goal of cluster analysis is to put the K
groups or samples into k homogeneous groups, samples, or classes, where k is
unknown but $k \le K$.

Often individuals or objects have been sampled from K > 1 populations. For
multi-samples or multiple groups of individuals or objects the data matrix may
be represented in partitioned form as above. Let $n_g$ represent the number of
individuals in the g-th (random) sample, g = 1, 2, ..., K. The $n_g$ are not
restricted to being equal or proportional to one another. The total number of
observations is $n = \sum_{g=1}^{K} n_g$. Let $x_{gi}$ be the p x 1 vector of
observations for individual i = 1, 2, ..., $n_g$ in group g = 1, 2, ..., K.

3. The Number of Clustering Alternatives for a Given K
Samples into k Nonempty Clusters

In  this  section,  we  shall  just  briefly  mention  how  to  obtain  the  total 
number  of  clustering  alternatives  for  a  given  K,  the  number  of  groups  or 
samples.  For  details  we  again  refer  the  reader  to  Bozdogan  and  Sclove  [5]. 

In general, the total number of ways of clustering K groups or samples
into k clusters is given by

(3.1)  $S(K,k) = \frac{1}{k!} \sum_{g=0}^{k} (-1)^{g} \binom{k}{g} (k-g)^{K},$

which is known as the Stirling number of the second kind (see, e.g., Abramowitz
and Stegun [1]) and is also called the number of clustering alternatives.

If k, the number of clusters of groups or samples, is known in advance,
then the total number of clustering alternatives is given by S(K,k). However,
if k is not specified a priori and varies, then the total number of clustering
alternatives for a given K, the number of groups or samples, is given by

(3.2)  $\sum_{k=1}^{K} S(K,k).$

For example, for K = 4 samples, if k is not specified a priori and varies,
there are in total 15 possible clustering alternatives. We cluster the K = 4
groups or samples first into k = 4 groups or samples, then into k = 3, k = 2,
and k = 1, using equation (3.1) for each, and sum the results using the
expression (3.2): S(4,4) + S(4,3) + S(4,2) + S(4,1) = 1 + 6 + 7 + 1 = 15.

Therefore, the total number of ways of clustering K groups or samples into
k homogeneous groups or samples is given by equation (3.1), and the total
number of possible clustering alternatives is given by the expression (3.2).
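As a quick check of (3.1) and (3.2), the counts can be computed and the alternatives listed explicitly. The sketch below is illustrative only; the function names `stirling2`, `total_alternatives`, and `set_partitions` are hypothetical helpers, and `math.comb` assumes Python 3.8 or later.

```python
from math import comb, factorial


def stirling2(K, k):
    """Stirling number of the second kind S(K, k), computed from (3.1)."""
    total = sum((-1) ** g * comb(k, g) * (k - g) ** K for g in range(k + 1))
    return total // factorial(k)


def total_alternatives(K):
    """Total number of clustering alternatives for K samples, as in (3.2)."""
    return sum(stirling2(K, k) for k in range(1, K + 1))


def set_partitions(items):
    """Yield every partition of `items` into nonempty clusters."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing cluster in turn ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or let it start a new cluster of its own.
        yield [[first]] + partition


print([stirling2(4, k) for k in range(1, 5)], total_alternatives(4))
for alternative in set_partitions(["S", "Ve", "Vi"]):
    print(alternative)
```

Enumerating the partitions directly is practical only for small K, since the total in (3.2) grows very quickly with the number of samples.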

4.  AIC  For  The  Univariate  Model 

We now turn our attention to situations with several univariate normal
samples. For example, we may have multi-sample data with samples of sizes
$n_1, n_2, \ldots, n_K$ which are assumed to have come from K populations, the
first with mean $\mu_1$ and variance $\sigma_1^2$, the second with mean $\mu_2$
and variance $\sigma_2^2$, ..., the K-th with mean $\mu_K$ and variance
$\sigma_K^2$. We may want to decide in this case whether the variances of
these K samples will be treated as equal or not, given no restriction on the
population means. In terms of the parameters, the univariate model is

$\theta = (\mu_1, \mu_2, \ldots, \mu_K, \sigma_1^2, \sigma_2^2, \ldots, \sigma_K^2)$

with m = 2k parameters, where k is the number of groups.

Recall the definition of AIC from Section 1,

$\mathrm{AIC} = -2 \log_e L(\hat{\theta}) + 2m = -2 \log_e(\text{maximized likelihood}) + 2m,$

where m denotes the number of independently adjusted parameters within the
model.

Suppose there are K independent samples of independent observations, with
$n_g$, g = 1, 2, ..., K, observations in the g-th group and
$n = \sum_{g=1}^{K} n_g$. Denote the unknown means of the groups by
$\mu_1, \mu_2, \ldots, \mu_K$, and the unknown variances of the groups by
$\sigma_1^2, \sigma_2^2, \ldots, \sigma_K^2$. Assume that the samples
$(z_{11}, z_{12}, \ldots, z_{1n_1}; \ldots; z_{K1}, \ldots, z_{Kn_K})$ are drawn
randomly from K populations which are $N(\mu_g, \sigma_g^2)$. The basic null
hypothesis of interest is given by

(4.1)  $\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_K^2.$

The alternative hypothesis is given by

Not all K variances are equal.

In the statistical literature, this is also known as the test of homogeneity of
variances or Bartlett's test.

Under the constraint (4.1), we call the common unknown value of the variances
$\sigma^2$. For the general model with varying means and variances, from which
we derive the form of AIC, the likelihood function is given by

(4.2)  $L(\{\mu_g, \sigma_g^2\}; z) = (2\pi)^{-n/2} \prod_{g=1}^{K} (\sigma_g^2)^{-n_g/2} \exp\left\{ -\sum_{g=1}^{K} \frac{1}{2\sigma_g^2} \sum_{i=1}^{n_g} (z_{gi} - \mu_g)^2 \right\}.$


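The log likelihood is

(4.3)  $l(\{\mu_g, \sigma_g^2\}) \equiv \log L(\{\mu_g, \sigma_g^2\}; z) = -\frac{n}{2} \log(2\pi) - \frac{1}{2} \sum_{g=1}^{K} n_g \log \sigma_g^2 - \sum_{g=1}^{K} \frac{1}{2\sigma_g^2} \sum_{i=1}^{n_g} (z_{gi} - \mu_g)^2,$

and the MLE's of $\mu_g$ and $\sigma_g^2$ are given by

(4.4)  $\hat{\mu}_g = \frac{1}{n_g} \sum_{i=1}^{n_g} z_{gi} \equiv \bar{z}_{g\cdot},$

(4.5)  $\hat{\sigma}_g^2 = \frac{1}{n_g} \sum_{i=1}^{n_g} (z_{gi} - \bar{z}_{g\cdot})^2 \equiv s_g^2, \quad g = 1, 2, \ldots, K.$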


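Substituting these back into (4.3) and simplifying, the maximized log
likelihood becomes

(4.6)  $l(\{\hat{\mu}_g, \hat{\sigma}_g^2\}; z) \equiv \log L(\{\hat{\mu}_g, \hat{\sigma}_g^2\}; z) = -\frac{n}{2} \log(2\pi) - \frac{1}{2} \sum_{g=1}^{K} n_g \log s_g^2 - \frac{n}{2}.$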

Since

(4.7)  $\mathrm{AIC} = -2 \log_e L(\hat{\theta}) + 2m,$

where m is the number of parameters, and since

(4.8)  $-2 \log L(\{\hat{\mu}_g, \hat{\sigma}_g^2\}) = n \log(2\pi) + \sum_{g=1}^{K} n_g \log s_g^2 + n,$

AIC becomes

(4.9)  $\mathrm{AIC}(\text{varying } \mu \text{ and } \sigma^2) = n \log(2\pi) + \sum_{g=1}^{K} n_g \log s_g^2 + n + 2(2k).$

Since the constants do not affect the result of comparison of models, we
could ignore them and use the simplified version

(4.10)  $\mathrm{AIC}^{*}(\text{varying } \mu \text{ and } \sigma^2) = \sum_{g=1}^{K} n_g \log_e s_g^2 + 2(2k),$

where

$s_g^2 = \frac{1}{n_g} \sum_{i=1}^{n_g} (z_{gi} - \bar{z}_{g\cdot})^2, \quad g = 1, 2, \ldots, K,$

k = number of groups or samples compared, so that 2k is the number of
independently adjusted parameters within the model.

However, for purposes of comparison we retain the constants and use AIC given
by (4.9).
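The criterion (4.9) is straightforward to compute once a clustering alternative has been fixed. The following Python sketch is an illustration under assumptions: the function name `univariate_aic` and the small data set are hypothetical, each inner list is taken to be one cluster of pooled observations, and every cluster is assumed to have a strictly positive variance so that the logarithm is defined.

```python
import math


def univariate_aic(clusters):
    """AIC of (4.9) for clusters that each have their own mean and variance.

    `clusters` is a list of lists; each inner list holds the observations
    of one cluster (the pooled observations of the samples grouped in it).
    """
    k = len(clusters)
    n = sum(len(c) for c in clusters)
    log_var_term = 0.0
    for c in clusters:
        n_g = len(c)
        mean_g = sum(c) / n_g                           # MLE of the mean, (4.4)
        s2_g = sum((z - mean_g) ** 2 for z in c) / n_g  # MLE of the variance, (4.5)
        log_var_term += n_g * math.log(s2_g)
    return n * math.log(2 * math.pi) + log_var_term + n + 2 * (2 * k)


# Made-up observations, for illustration only.
separate = [[1.0, 3.0], [2.0, 6.0]]   # two clusters, k = 2
pooled = [[1.0, 3.0, 2.0, 6.0]]       # one cluster, k = 1
print(univariate_aic(separate), univariate_aic(pooled))
```

Note that the variance estimate uses the divisor n_g, as in (4.5), rather than the unbiased divisor n_g - 1.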

5.  AIC  For  the  Multivariate  Model 

As we mentioned in Section 1, the assumption of equality of variances
in one-way ANOVA causes serious problems when we are testing the equality of
several means. In parallel, in the multivariate case the assumption of
equality of covariance matrices causes even more serious problems. For this
reason we may first want to test the equality of covariance matrices against
the alternative that not all covariance matrices are equal. Therefore,
throughout this section we shall suppose that we have independent data matrices
$X_1, X_2, \ldots, X_K$, where the rows of $X_g$ $(n_g \times p)$ are independent
and identically distributed (i.i.d.) $N_p(\mu_g, \Sigma_g)$, g = 1, 2, ..., K.
In terms of the parameters, the multivariate model we shall consider is

$\theta = (\mu_1, \mu_2, \ldots, \mu_K, \Sigma_1, \Sigma_2, \ldots, \Sigma_K)$

with m = kp + kp(p+1)/2 parameters, where k is the number of groups and p is
the number of variables.

Thus, the basic null hypothesis we usually are interested in testing is
given by

(5.1)  $H_0: \Sigma_1 = \Sigma_2 = \cdots = \Sigma_K.$

The alternative hypothesis is given by

$H_1$: Not all K covariance matrices are equal.

In multivariate analysis this is known as the test of homogeneity of
covariance matrices.

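To derive Akaike's Information Criterion (AIC) in this case, we start from
the log likelihood function, which is given by

(5.2)  $l(\{\mu_g, \Sigma_g\}; X) \equiv \log L(\{\mu_g, \Sigma_g\}; X) = -\frac{np}{2} \log(2\pi) - \frac{1}{2} \sum_{g=1}^{K} n_g \log|\Sigma_g| - \frac{1}{2} \sum_{g=1}^{K} \mathrm{tr}(\Sigma_g^{-1} A_g) - \frac{1}{2} \sum_{g=1}^{K} n_g (\bar{x}_g - \mu_g)' \Sigma_g^{-1} (\bar{x}_g - \mu_g).$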

The MLE's of $\mu_g$ and $\Sigma_g$ are

(5.3)  $\hat{\mu}_g = \bar{x}_g, \quad g = 1, 2, \ldots, K,$

and

(5.4)  $\hat{\Sigma}_g = A_g / n_g.$

Substituting these back into (5.2) and simplifying, the maximized log
likelihood becomes

(5.5)  $l(\{\hat{\mu}_g, \hat{\Sigma}_g\}; X) \equiv \log L(\{\hat{\mu}_g, \hat{\Sigma}_g\}; X) = -\frac{np}{2} \log(2\pi) - \frac{1}{2} \sum_{g=1}^{K} n_g \log|n_g^{-1} A_g| - \frac{np}{2}.$

Since

(5.6)  $\mathrm{AIC} = -2 \log_e L(\hat{\theta}) + 2m,$

where m = kp + kp(p+1)/2 is the number of parameters, AIC becomes

(5.7)  $\mathrm{AIC}(\text{varying } \mu \text{ and } \Sigma) = np \log(2\pi) + \sum_{g=1}^{K} n_g \log|n_g^{-1} A_g| + np + 2[kp + kp(p+1)/2].$

Since the constants do not affect the result of comparison of models, we
could ignore them and reduce the form of AIC to a much simpler form

(5.8)  $\mathrm{AIC}^{*}(\text{varying } \mu \text{ and } \Sigma) = \sum_{g=1}^{K} n_g \log|A_g| + 2[kp + kp(p+1)/2],$

where
$n_g$ = sample size of group or sample g = 1, 2, ..., K,
$|A_g|$ = the determinant of the sum of squares and cross-products (SSCP)
matrix for group or sample g = 1, 2, ..., K,
k = number of groups or samples compared, and
p = number of variables.

However, for purposes of comparison we retain the constants and use AIC given
by (5.7).
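A minimal Python sketch of (5.7) follows, using only the standard library. The names `determinant` and `multivariate_aic` are hypothetical, the SSCP matrices are assumed to be given as nested lists with positive determinants, and the small diagonal matrices in the usage lines are made up for illustration.

```python
import math


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    size = len(a)
    det = 1.0
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap changes the sign of the determinant
        det *= a[col][col]
        for r in range(col + 1, size):
            factor = a[r][col] / a[col][col]
            for c in range(col, size):
                a[r][c] -= factor * a[col][c]
    return det


def multivariate_aic(sscp_matrices, sizes):
    """AIC of (5.7); sscp_matrices[g] is A_g (p x p) and sizes[g] is n_g."""
    k = len(sscp_matrices)
    p = len(sscp_matrices[0])
    n = sum(sizes)
    m = k * p + k * p * (p + 1) // 2
    log_det_term = 0.0
    for a_g, n_g in zip(sscp_matrices, sizes):
        # log|n_g^{-1} A_g| = log|A_g| - p log n_g
        log_det_term += n_g * (math.log(determinant(a_g)) - p * math.log(n_g))
    return n * p * math.log(2 * math.pi) + log_det_term + n * p + 2 * m


# Made-up SSCP matrices for two clusters with p = 2 variables.
A1, A2 = [[4.0, 0.0], [0.0, 4.0]], [[3.0, 0.0], [0.0, 3.0]]
print(multivariate_aic([A1, A2], [4, 3]))
```

For larger or nearly singular matrices a log-determinant computed from a factorization is usually preferable to forming the determinant itself.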

6. Numerical Examples of Multi-Sample Cluster Analysis on Fisher Iris Data

In this section we shall give numerical examples of both univariate and
multivariate multi-sample data, cluster the groups or samples, and choose
the best clusterings by using Akaike's Information Criterion (AIC) as derived
in Sections 4 and 5.

All computations for the examples presented here were carried out on an
IBM 370, using statistical software packages such as MINITAB, SPSS, and
SPEAKEASY (VM/CMS version).


6.1. Univariate Examples

For the univariate numerical examples we shall illustrate our results on
the Fisher [6] iris data.

Example 6.1. Clustering of Irises by Groups: The iris data set is composed
of 150 iris plants belonging to three groups or species, namely Iris setosa
(S), Iris versicolor (Ve), and Iris virginica (Vi), measured on sepal and petal
length and width. Each group is represented by 50 plants. The data for
the 150 irises are given in Table 6.1.

This data set has been studied quite extensively in classification and
cluster analysis since it was published by Fisher [6], and it is still used
today as a "testing ground" for classification and clustering methods
proposed by many investigators such as Friedman and Rubin [7], Kendall [8],
Solomon [10], Mezzich and Solomon [9], and many others, including the present
authors.

For each of the 150 plants we already know the group structure of the
iris species, namely K = 3 groups or samples. Even though the two species Iris
setosa and Iris versicolor were found growing in the same colony, and Iris
virginica was found growing in a different colony, Fisher reports in his
linear discriminant analysis the complete separation of I. setosa from I.
versicolor and I. virginica. Since then other investigators, such as the ones
we mentioned above, have shown similar results in their studies.

TABLE 6.1. MEASUREMENTS ON THREE TYPES OF IRIS

[Columns: sepal length, sepal width, petal length, and petal width for each of
Iris setosa, Iris versicolor, and Iris virginica.]

With this in mind, let us take K = 3 groups or species on each of the
variables separately and cluster them into k = 1, 2, and 3 homogeneous groups.
Since we are dealing with K = 3 groups, by using equation (3.1) and the
expression (3.2) in Section 3, we obtain in total 5 possible clustering
alternatives. Denoting I. setosa by S, I. versicolor by Ve, and I. virginica
by Vi, we have (S) (Ve) (Vi), (S, Ve) (Vi), (S, Vi) (Ve), (Ve, Vi) (S), and
(S, Ve, Vi) as all the possible clustering alternatives of the three iris
species. Using the univariate model
$\theta = (\mu_1, \mu_2, \mu_3, \sigma_1^2, \sigma_2^2, \sigma_3^2)$ with varying
means and variances as our underlying model for clustering the iris groups, we
obtained, from a simple computer run using the MINITAB package, the AIC's for
each of the 5 clustering alternatives on each of the four variables separately.
We report our results on each of the four variables, respectively, as follows.


TABLE 6.2. THE AIC'S FOR IRISES BY GROUPS ON VARIABLE SEPAL LENGTH

Alternative  Clustering      n log_e(2π)  Σ n_g log_e s_g²    n    k   2(2k)   AIC
1            (S) (Ve) (Vi)   275.681      -218.710            150  3   12      218.971 a
2            (S, Ve) (Vi)    275.681      -136.019            150  2    8      297.662
3            (S, Vi) (Ve)    275.681       -79.394            150  2    8      354.287
4            (Ve, Vi) (S)    275.681      -188.536            150  2    8      245.145 b
5            (S, Ve, Vi)     275.681       -57.603            150  1    4      372.078

TABLE 6.3. THE AIC'S FOR IRISES BY GROUPS ON VARIABLE SEPAL WIDTH

Alternative  Clustering      n log_e(2π)  Σ n_g log_e s_g²    n    k   2(2k)   AIC
1            (S) (Ve) (Vi)   275.681      -329.102            150  3   12      108.579 a
2            (S, Ve) (Vi)    275.681      -262.503            150  2    8      171.178
3            (S, Vi) (Ve)    275.681      -292.416            150  2    8      141.265
4            (Ve, Vi) (S)    275.681      -319.093            150  2    8      114.588 b
5            (S, Ve, Vi)     275.681      -250.132            150  1    4      179.549

TABLE 6.4. THE AIC'S FOR IRISES BY GROUPS ON VARIABLE PETAL LENGTH

Alternative  Clustering      n log_e(2π)  Σ n_g log_e s_g²    n    k   2(2k)   AIC
1            (S) (Ve) (Vi)   275.681      -313.055            150  3   12      124.626 a
2            (S, Ve) (Vi)    275.681        12.795            150  2    8      446.476
3            (S, Vi) (Ve)    275.681        70.394            150  2    8      504.075
4            (Ve, Vi) (S)    275.681      -215.414            150  2    8      218.267 b
5            (S, Ve, Vi)     275.681       169.888            150  1    4      599.569

TABLE 6.5. THE AIC'S FOR IRISES BY GROUPS ON VARIABLE PETAL WIDTH

Alternative  Clustering      n log_e(2π)  Σ n_g log_e s_g²    n    k   2(2k)   AIC
1            (S) (Ve) (Vi)   275.681      -519.344            150  3   12      -81.663 a
2            (S, Ve) (Vi)    275.681      -245.374            150  2    8      188.307
3            (S, Vi) (Ve)    275.681      -181.176            150  2    8      252.505
4            (Ve, Vi) (S)    275.681      -398.271            150  2    8       35.410 b
5            (S, Ve, Vi)     275.681       -82.454            150  1    4      347.227

$\mathrm{AIC}(\text{varying } \mu \text{ and } \sigma^2) = n \log_e(2\pi) + \sum_{g=1}^{K} n_g \log_e s_g^2 + n + 2(2k)$
a First minimum AIC
b Second minimum AIC


Looking at each of the tables above, we see that on each of the variables
the first minimum AIC occurs at alternative submodel 1, namely (S) (Ve) (Vi).
That is, the MAICE is submodel 1, indicating that there are indeed three types
of species across all the variables. The second minimum AIC occurs at
alternative submodel 4, again across all the variables, indicating that if we
were to cluster any iris species, we should cluster I. versicolor and I.
virginica together as one homogeneous group.

Thus our minimum AIC results for each of the variables confirm other
investigators' findings, including Fisher's results on the iris data.
Moreover, if we were to choose among the submodels, we would choose the one
with the smallest minimum AIC as the best submodel. Examining Tables 6.2,
6.3, 6.4, and 6.5, we see that the smallest minimum AIC occurs at submodel 1
in Table 6.5 on variable petal width. This indicates that petal width alone
separates the three iris species with virtual certainty, confirming again
Fisher's results (see, e.g., Fisher [6]).

Hence, we note here that we are clustering the irises by groups or species
under a more general model, rather than under the ANOVA model which we
considered as our underlying model in a previous paper on multi-sample cluster
analysis.
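The AIC columns of Tables 6.2 through 6.5 can be recomputed from the tabulated sums of n_g log_e s_g² by applying (4.9). The Python sketch below does this as a consistency check; the log-variance sums are transcribed from the tables, while the names `LOG_VAR_TERMS`, `ALTERNATIVES`, and `aic_table` are hypothetical.

```python
import math

N = 150  # total number of plants

# (label, k) for the five clustering alternatives, in table order.
ALTERNATIVES = [
    ("(S) (Ve) (Vi)", 3),
    ("(S, Ve) (Vi)", 2),
    ("(S, Vi) (Ve)", 2),
    ("(Ve, Vi) (S)", 2),
    ("(S, Ve, Vi)", 1),
]

# Sum over clusters of n_g log_e s_g^2, transcribed from Tables 6.2-6.5.
LOG_VAR_TERMS = {
    "sepal length": [-218.710, -136.019, -79.394, -188.536, -57.603],
    "sepal width": [-329.102, -262.503, -292.416, -319.093, -250.132],
    "petal length": [-313.055, 12.795, 70.394, -215.414, 169.888],
    "petal width": [-519.344, -245.374, -181.176, -398.271, -82.454],
}


def aic_table(log_var_terms):
    """Apply (4.9) to each alternative: n log(2 pi) + term + n + 2(2k)."""
    constant = N * math.log(2 * math.pi) + N
    return [constant + term + 2 * (2 * k)
            for term, (_, k) in zip(log_var_terms, ALTERNATIVES)]


for variable, terms in LOG_VAR_TERMS.items():
    aics = aic_table(terms)
    order = sorted(range(len(aics)), key=aics.__getitem__)
    best, second = ALTERNATIVES[order[0]][0], ALTERNATIVES[order[1]][0]
    print(f"{variable}: first minimum {best}, second minimum {second}")
```

Small differences in the third decimal place relative to the tables can arise because the tables use the rounded constant 275.681.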

6.2.  A  Multivariate  Example 

Now we consider the Fisher iris data again, and this time we cluster the
K = 3 groups or species into k = 1, 2, and 3 homogeneous groups on the basis
of all four variables, assuming the multivariate model given in terms of the
parameters $\theta = (\mu_1, \mu_2, \mu_3, \Sigma_1, \Sigma_2, \Sigma_3)$ as the
underlying model for clustering these three iris groups or species. Running
the SPSS MANOVA program on the iris data, we obtain the following sum of
squares and cross-products (SSCP) matrices for each of the clustering
alternatives. These are:

(1) (S) (VE) (VI)

A_(S) =
    [  6.0882    4.8616     .8014     .5062 ]
    [  4.8616    7.0408     .5732     .4556 ]
    [   .8014     .5732    1.4778     .2974 ]
    [   .5062     .4556     .2974     .5442 ]

|50^{-1} A_(S)| = 1.949E-6,    log_e(1.949E-6) = -13.148

A_(VE) =
    [ 13.561     4.362     9.066     2.7436 ]
    [  4.362     4.825     4.05      2.019  ]
    [  9.066     4.05     10.82      3.582  ]
    [  2.7436    2.019     3.582     1.9162 ]

|50^{-1} A_(VE)| = 1.8053E-5,    log_e(1.8053E-5) = -10.922

A_(VI) =
    [ 19.813     4.5944   14.861     2.4056 ]
    [  4.5944    5.0962    3.4976    2.3338 ]
    [ 14.861     3.4976   14.925     2.3924 ]
    [  2.4056    2.3338    2.3924    3.6962 ]

|50^{-1} A_(VI)| = 1.2244E-4,    log_e(1.2244E-4) = -9.0079

(2) (S, VE) (VI)

A_(S, VE) =
    [ 40.901    -5.9433   74.361    28.144  ]
    [ -5.9433   22.69    -41.404   -15.291  ]
    [ 74.361   -41.404   208.02     79.425  ]
    [ 28.144   -15.291    79.425    31.62   ]

|100^{-1} A_(S, VE)| = 3.3118E-4,    log_e(3.3118E-4) = -8.0128

(3) (S, VI) (VE)

A_(S, VI) =
    [ 88.469    -8.4997  177.42     73.311  ]
    [ -8.4997   17.29    -42.351   -17.414  ]
    [177.42    -42.351   434.61    184.69   ]
    [ 73.311   -17.414   184.69     83.45   ]

|100^{-1} A_(S, VI)| = .0025193,    log_e(.0025193) = -5.9838

(4) (VE, VI) (S)

A_(VE, VI) =
    [ 44.264    12.322    45.245    16.699  ]
    [ 12.322    10.962    14.137     7.9228 ]
    [ 45.245    14.137    67.476    28.584  ]
    [ 16.699     7.9228   28.584    17.862  ]

|100^{-1} A_(VE, VI)| = 3.1476E-4,    log_e(3.1476E-4) = -8.0637

(5) (S, VE, VI)

A_(S, VE, VI) =
    [102.6      -6.0197  189.78     76.884  ]
    [ -6.0197   28.307   -49.119   -18.124  ]
    [189.78    -49.119   464.33    193.05   ]
    [ 76.884   -18.124   193.05     86.57   ]

|150^{-1} A_(S, VE, VI)| = .0018787,    log_e(.0018787) = -6.2772
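The determinants reported above can be checked independently. The Python sketch below recomputes log_e|50^{-1} A_(S)| for the I. setosa SSCP matrix; it uses a Cholesky factorization, which is a choice made here for numerical convenience and is not the routine used in the paper, and the function name `cholesky_logdet` is hypothetical.

```python
import math


def cholesky_logdet(a):
    """log-determinant of a symmetric positive definite matrix.

    Uses the Cholesky factor L with A = L L', so that
    log|A| = 2 * sum(log L_ii).
    """
    size = len(a)
    lower = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i + 1):
            s = sum(lower[i][t] * lower[j][t] for t in range(j))
            if i == j:
                lower[i][j] = math.sqrt(a[i][i] - s)
            else:
                lower[i][j] = (a[i][j] - s) / lower[j][j]
    return 2.0 * sum(math.log(lower[i][i]) for i in range(size))


# SSCP matrix for I. setosa, transcribed from alternative (1) above.
A_S = [
    [6.0882, 4.8616, 0.8014, 0.5062],
    [4.8616, 7.0408, 0.5732, 0.4556],
    [0.8014, 0.5732, 1.4778, 0.2974],
    [0.5062, 0.4556, 0.2974, 0.5442],
]
n_S, p = 50, 4

# log|n^{-1} A| = log|A| - p log n
log_det_S = cholesky_logdet(A_S) - p * math.log(n_S)
print(round(log_det_S, 3))
```

The result agrees with the reported value of log_e(1.949E-6) to about three decimal places; the remaining difference reflects rounding of the matrix entries.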

After carrying out all our computations for each of the clustering alternatives
(using the matrix algebra routines in the SPEAKEASY interactive computer
package), we obtain the AIC's from (5.7). The results are shown in Table 6.6.


TABLE 6.6. THE AIC'S FOR IRISES BY GROUPS ON ALL VARIABLES

Alternative  Clustering      np log_e(2π)  Σ n_g log_e|n_g^{-1} A_g|   np    k   2m   AIC
1            (S) (Ve) (Vi)   1,102.724     -1,653.895                  600   3   84   132.829 a
2            (S, Ve) (Vi)    1,102.724     -1,251.675                  600   2   56   507.049
3            (S, Vi) (Ve)    1,102.724     -1,144.480                  600   2   56   614.244
4            (Ve, Vi) (S)    1,102.724     -1,463.770                  600   2   56   294.954 b
5            (S, Ve, Vi)     1,102.724       -941.580                  600   1   28   789.144

n = 150 plants, p = 4 variables
m = kp + kp(p+1)/2 parameters
$\mathrm{AIC}(\text{varying } \mu \text{ and } \Sigma) = np \log_e(2\pi) + \sum_{g=1}^{K} n_g \log_e|n_g^{-1} A_g| + np + 2m$
a First minimum AIC
b Second minimum AIC

Hence, looking at Table 6.6, we see that, using all four variables
simultaneously, the first minimum AIC occurs at alternative submodel 1,
that is, when (S) (Ve) (Vi) are all clustered separately. This indicates
again that there are indeed three types of species. Therefore, the MAICE is
submodel 1. Not surprisingly, the second minimum AIC occurs at
alternative submodel 4, telling us that if we were to cluster any two of the
iris groups, we should cluster I. versicolor and I. virginica together as
one homogeneous group, and we should cluster I. setosa completely separately
as one heterogeneous group.

Here, it is important to note that we also obtained the same results when
we used the four variables separately in our computation of AIC in the
previous section, which is encouraging.
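The AIC column of Table 6.6 can be recomputed from the determinants |n_g^{-1} A_g| reported for each cluster by applying (5.7). The Python sketch below is a consistency check only; the determinants are transcribed from Section 6.2, and the names `DETERMINANTS`, `CLUSTERINGS`, and `aic_from_determinants` are hypothetical.

```python
import math

N, P = 150, 4  # plants and variables

# Reported determinants |n_g^{-1} A_g|, keyed by cluster; each sample has 50 plants.
DETERMINANTS = {
    ("S",): 1.949e-6,
    ("Ve",): 1.8053e-5,
    ("Vi",): 1.2244e-4,
    ("S", "Ve"): 3.3118e-4,
    ("S", "Vi"): 0.0025193,
    ("Ve", "Vi"): 3.1476e-4,
    ("S", "Ve", "Vi"): 0.0018787,
}

# The five clustering alternatives, in the order of Table 6.6.
CLUSTERINGS = [
    [("S",), ("Ve",), ("Vi",)],
    [("S", "Ve"), ("Vi",)],
    [("S", "Vi"), ("Ve",)],
    [("Ve", "Vi"), ("S",)],
    [("S", "Ve", "Vi")],
]


def aic_from_determinants(clustering):
    """Apply (5.7) using the tabulated determinants for each cluster."""
    k = len(clustering)
    m = k * P + k * P * (P + 1) // 2
    log_det_term = sum(50 * len(cluster) * math.log(DETERMINANTS[cluster])
                       for cluster in clustering)
    return N * P * math.log(2 * math.pi) + log_det_term + N * P + 2 * m


aics = [aic_from_determinants(c) for c in CLUSTERINGS]
for number, value in enumerate(aics, start=1):
    print(number, round(value, 2))
```

The recomputed values agree with Table 6.6 to within a few hundredths; the small differences come from rounding in the reported determinants and constants.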

7.  Conclusions  and  Discussion 

From our numerical results in Section 6, we see that AIC, and consequently
the minimum AIC procedure, can indeed successfully identify the best clustering
alternatives when we cluster samples into homogeneous sets of samples, both in
the univariate and the multivariate models with varying parameters.

In our previous paper on multi-sample cluster analysis (Bozdogan and
Sclove [5]), we considered ANOVA and MANOVA as our two underlying models, where
the assumptions of equal variances and equal covariance matrices were used to
cluster the groups or samples for multi-sample data. There, we also used AIC
to identify the best clustering alternatives in clustering the iris groups or
species. We obtained the same results: there are three types of iris
species, and if we were to cluster any two of the iris groups, we
should cluster I. versicolor and I. virginica together as one homogeneous
group, and we should cluster I. setosa completely separately as one
heterogeneous group.

Summarizing the AIC values for the multivariate case only, from the
previous paper and this one, we obtain the following table.

TABLE 6.7. THE AIC'S FOR IRISES BY GROUPS ON ALL
VARIABLES UNDER TWO MULTIVARIATE MODELS

Alternative  Clustering      AIC(varying μ and Σ)   AIC(common Σ)
1            (S) (Ve) (Vi)   132.829 a              242.524 a
2            (S, Ve) (Vi)    507.049                652.824
3            (S, Vi) (Ve)    614.244                750.334
4            (Ve, Vi) (S)    294.954 b              439.124 b
5            (S, Ve, Vi)     789.144                788.994

a First minimum AIC
b Second minimum AIC


Comparing the AIC's in Table 6.7 above, we see that the AIC(varying μ and Σ)
values are much smaller than the AIC(common Σ) values for each of the
clustering alternatives except the last one (i.e., alternative 5), where the
two values are nearly equal. Since, according to the definition of AIC, the
model with the minimum AIC is chosen to be the best model, the above results
suggest that when we are clustering the iris data, and in general, we should
use different covariance matrices rather than equal covariance matrices in
data analysis.

As we mentioned in the introduction of this paper, in practice the
assumption of equal covariance matrices within the model is a rather dubious
requirement.

Thus, in concluding, we see that the use of AIC shows how to combine the
information in the likelihood with an appropriate function of the number of
parameters to obtain estimates of the information provided by competing
alternative models. Therefore, the definition of MAICE gives a clear
formulation of the principle of parsimony in statistical model building or
comparison, as we demonstrated by numerical examples. MAICE also provides a
versatile procedure for statistical model identification which is free from
the ambiguities inherent in the application of conventional statistical
procedures.

